
DOC-12487 Maintain durable writes #3810


Open · wants to merge 7 commits into release/8.0

Conversation

ggray-cb
Contributor

@ggray-cb ggray-cb commented May 21, 2025

This PR incorporates changes for the Morpheus feature MB-43068 Maintain Durable Write Availability after losing a replica.

Changes in this PR (links lead to the preview site; see this page for the username & password)


@BenHuddleston BenHuddleston left a comment


As a general comment, mixing the rewrite of the existing docs and the new feature makes this a harder review

@@ -108,6 +108,21 @@ and limits the number of metrics to 100.
Additional information sent by clients at connection time can be found in the logs.


[[new-feature-800-maintain-durable-writes]]
https://jira.issues.couchbase.com/browse/MB-43068[MB-43068] Optionally Maintain Durable Writes During Single Replica Failovers::


I don't think that this title accurately reflects what this does. Things do not behave differently if there is a "Single" failover, or more. "Replica" is somewhat overloaded as we don't fail over replicas. It also does not state what we "Maintain".

Suggest:
"Optionally Maintain Durable Write Availability Without Majority After Failover"

@@ -108,6 +108,21 @@ and limits the number of metrics to 100.
Additional information sent by clients at connection time can be found in the logs.


[[new-feature-800-maintain-durable-writes]]
https://jira.issues.couchbase.com/browse/MB-43068[MB-43068] Optionally Maintain Durable Writes During Single Replica Failovers::
In a bucket with a single replica, you can enable an option named `durabilityImpossibleFallback` that allows durable writes to succeed even when they cannot meet their majority requirements.


Should not mention the number of replicas

https://jira.issues.couchbase.com/browse/MB-43068[MB-43068] Optionally Maintain Durable Writes During Single Replica Failovers::
In a bucket with a single replica, you can enable an option named `durabilityImpossibleFallback` that allows durable writes to succeed even when they cannot meet their majority requirements.
This option is off by default.
This is a temporary setting to allow clients to continue to write data when nodes are unavailable due to failovers.


What is "temporary" about this setting? I don't think that we have any intent to remove it, and it does not turn itself off.

In a bucket with a single replica, you can enable an option named `durabilityImpossibleFallback` that allows durable writes to succeed even when they cannot meet their majority requirements.
This option is off by default.
This is a temporary setting to allow clients to continue to write data when nodes are unavailable due to failovers.
For example, you can enable this option while you're performing an upgrade using the graceful failover followed by a delta recovery method.


This example applies only to the single replica case. If/when you remove the previous references to a single replica, you may wish/need to add that here.
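
For readers who want to see what that upgrade-window usage looks like in practice, here is a minimal sketch that toggles the setting around a maintenance window. It assumes the standard bucket-edit REST endpoint (`/pools/default/buckets/<name>`) accepts the `durabilityImpossibleFallback` parameter described in this PR; the host, credentials, and bucket name are placeholders.

```python
import requests

BASE = "http://127.0.0.1:8091"        # placeholder cluster-admin REST address
AUTH = ("Administrator", "password")  # placeholder credentials
BUCKET = "travel-sample"              # placeholder bucket name

def set_fallback(value: str) -> None:
    # Edit an existing bucket's settings via the standard REST path.
    resp = requests.post(f"{BASE}/pools/default/buckets/{BUCKET}",
                         auth=AUTH,
                         data={"durabilityImpossibleFallback": value})
    resp.raise_for_status()

set_fallback("fallbackToActiveAck")   # enable before the graceful failover
# ... graceful failover, upgrade, delta recovery ...
set_fallback("disabled")              # restore the safe default afterwards
```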

Once the write has been committed as specified by the requirements, Couchbase Server notifies the client of success.
If commitment was not possible, Couchbase Server notifies the client of failure; and the data retains its former value throughout the cluster.
After a write meets its durability requirements, Couchbase Server notifies the client of success.
If the write does not meet the durability requirements, Couchbase Server notifies the client that the write failed.


I would replace "If the write does not meet the durability requirements" with "If the write cannot meet the durability requirements".

To me, "does not" could mean that we find out after attempting it, after which we will not return that the write failed, we will return the ambiguous response.
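
To make the distinction concrete, here is a hedged sketch using the Couchbase Python SDK: a write that cannot meet its requirements is rejected up front with a durability-impossible error, while a write that was attempted but left unresolved (for example, a timeout) surfaces as ambiguous. The exception class names follow the 4.x Python SDK; the connection string, credentials, bucket, and document are placeholders.

```python
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.durability import DurabilityLevel, ServerDurability
from couchbase.exceptions import (DurabilityAmbiguousException,
                                  DurabilityImpossibleException)
from couchbase.options import ClusterOptions, UpsertOptions

cluster = Cluster("couchbase://127.0.0.1",  # placeholder connection string
                  ClusterOptions(PasswordAuthenticator("Administrator", "password")))
collection = cluster.bucket("travel-sample").default_collection()

try:
    collection.upsert(
        "order::1001", {"total": 99.95},
        UpsertOptions(durability=ServerDurability(DurabilityLevel.MAJORITY)))
except DurabilityImpossibleException:
    # "Cannot meet": rejected before the write was attempted anywhere.
    print("durable write impossible with the current topology")
except DurabilityAmbiguousException:
    # "Did not meet": attempted, but the outcome is unknown (e.g. a timeout).
    print("durable write outcome is ambiguous")
```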

.Potential Data Loss
====
Enabling `durabilityImpossibleFallback` degrades the promise that durable writes offer: that Couchbase Server has persisted the data in a way that should survive node failure.
When enabled for a bucket, this setting makes durable writes to it during a replica failover no more safe from data loss than regular asynchronous writes.


double space "setting makes"

.Potential Data Loss
====
Enabling `durabilityImpossibleFallback` degrades the promise that durable writes offer: that Couchbase Server has persisted the data in a way that should survive node failure.
When enabled for a bucket, this setting makes durable writes to it during a replica failover no more safe from data loss than regular asynchronous writes.


Same comment as before on "replica" failover.


Overrides Couchbase Server's default behavior when it cannot meet a durable write's majority requirement.
When set to the default `disabled` setting, Couchbase Server reports to clients that a durable write that cannot meet its majority requirement has failed.
It also rolls back any data changes by the write across all nodes in the cluster.


It does not roll back anything. This check is performed before even attempting to contact a replica based on the configuration that the cluster manager passes to the data service.
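
A rough sketch of that fail-fast admission check may help reviewers picture it. This is illustrative pseudocode of the behavior described in the comment above, not the data service's actual implementation; all names are invented.

```python
def admit_durable_write(chain, fallback):
    """Illustrative only. `chain` is the vBucket's replication chain as
    configured by the cluster manager (active node first, None for nodes
    lost to failover); `fallback` is the bucket's
    durabilityImpossibleFallback value."""
    copies = len(chain)                      # active + configured replicas
    majority = copies // 2 + 1               # 2 copies -> 2, 3 copies -> 2
    reachable = sum(node is not None for node in chain)
    if reachable >= majority or fallback == "fallbackToActiveAck":
        return "attempt"                     # proceed with the durable write
    # Rejected before any replica is contacted: nothing to roll back.
    return "durability_impossible"
```

For example, `admit_durable_write(["node1", None], "disabled")` returns `durability_impossible`, while the same chain with `"fallbackToActiveAck"` proceeds.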

Overrides Couchbase Server's default behavior when it cannot meet a durable write's majority requirement.
When set to the default `disabled` setting, Couchbase Server reports to clients that a durable write that cannot meet its majority requirement has failed.
It also rolls back any data changes by the write across all nodes in the cluster.
If you set this value to `fallbackToActiveAck`, Couchbase Server reports the write as successful even if it could not meet the majority requirement.


double space "the majority"

Overrides Couchbase Server's default behavior when it cannot meet a durable write's majority requirement.
When set to the default `disabled` setting, Couchbase Server reports to clients that a durable write that cannot meet its majority requirement has failed.
It also rolls back any data changes by the write across all nodes in the cluster.
If you set this value to `fallbackToActiveAck`, Couchbase Server reports the write as successful even if it could not meet the majority requirement.


This is also somewhat ambiguous/incorrect, it's still possible to see the ambiguous response if the replica is configured but the write times out for some reason.

Contributor

@hyunjuV hyunjuV left a comment


@ggray-cb
I've looked at the changes in this PR and at Ben Huddleston's comments. I do not have additional comments.

@BenHuddleston
Thank you for reviewing!

Contributor

@rao-shwe rao-shwe left a comment


Hi @ggray-cb
I've completed a round of editorial review, but it looks like the technical review feedback still needs to be implemented. Once that is complete, I'll do another round of editorial review.


This form of write is referred to as a _durable_ or _synchronous_ write.
Couchbase Server supports durability for up to two replicas.
It does not support durability for buckets with three replicas.
Contributor


Change to:
"...with three or more replicas."

Such a write may be appropriate when saving data whose loss could have a considerable, negative impact.
For example, data corresponding to a financial transaction.
* A durable write is synchronous and provides durability guarantees.
Use this type of write for data where loss could have significant negative consequences, such as financial transactions.
Contributor


Change to:
"..where the loss could result in significant..."

For a write to be durable, it must meet a majority requirement.
The majority requirement is based on the number of replicas defined for the bucket.

The following table shows the majority requirement for each replica setting:
Contributor


Remove the extra space before "majority".
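
As a cross-check on the table's values, the requirement is a simple majority of the configured copies (the active plus its replicas), i.e. floor(n/2) + 1, which matches the two-node majority for one replica cited later on this page. A tiny sketch:

```python
# Majority over configured copies (active + replicas): floor(n / 2) + 1.
for replicas in (0, 1, 2):
    copies = replicas + 1
    print(f"{replicas} replica(s): majority = {copies // 2 + 1}")
# 0 replica(s): majority = 1
# 1 replica(s): majority = 2
# 2 replica(s): majority = 2
```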

|===

[WARNING]
====

In consequence of the correspondences listed above, if a bucket is configured with one replica, and a node fails, durable writes are immediately unavailable for any vBucket whose data resides on the failed node.

As shown by the table, if you configure a bucket with one replica and a node fails, you cannot perform durable write for any vBucket whose data was on the failed node.
Contributor


Can it be changed to:
"... you cannot perform durable write for any vBucket data that was on the failed node."

[[maintaining-durable-writes]]
== Maintaining Durable Writes During Single Replica Failovers

As described in <<#majority>>, a bucket with one replicas must meet a majority requirement of two nodes for a durable write to succeed.
Contributor


Change to:
".. a bucket with one replica must meet..."


For information about the `durabilityImpossibleFallback` setting, see xref:learn:data/durability.adoc#maintaining-durable-writes[Maintaining Durable Writes During Single Replica Failovers].

You can modify this parameter for existing buckets.
Contributor


Change to:
".. for the existing buckets".

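If it helps to verify the change on a live bucket, a hedged companion sketch reads the bucket's configuration back via the standard bucket-info REST endpoint. Whether the response JSON exposes this key, and under exactly this name, is an assumption here; host, credentials, and bucket are placeholders.

```python
import requests

resp = requests.get(
    "http://127.0.0.1:8091/pools/default/buckets/travel-sample",  # placeholders
    auth=("Administrator", "password"))
resp.raise_for_status()
# Assumed key name; default shown as "disabled" per the docs text above.
print(resp.json().get("durabilityImpossibleFallback", "disabled"))
```
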
[[new-feature-800-maintain-durable-writes]]
https://jira.issues.couchbase.com/browse/MB-43068[MB-43068] Optionally Maintain Durable Writes During Single Replica Failovers::
In a bucket with a single replica, you can enable an option named `durabilityImpossibleFallback` that allows durable writes to succeed even when they cannot meet their majority requirements.
This option is off by default.
Contributor


Can it be changed to:
"This option is off by default."
